    Generalized Clusterwise Regression for Simultaneous Estimation of Optimal Pavement Clusters and Performance Models

    The existing state-of-the-art approach of Clusterwise Regression (CR) for estimating pavement performance models (PPMs) pre-specifies explanatory variables without testing their significance and requires the number of clusters for a given data set as an input. Time-consuming ‘trial and error’ methods are therefore required to determine the optimal number of clusters. A common objective function is the minimization of the total sum of squared errors (SSE). Because SSE decreases monotonically as the number of clusters increases, the number of clusters that minimizes SSE is always the total number of data points. Hence, minimizing SSE is not a suitable objective for finding the optimal number of clusters. In previous studies, the PPMs were restricted to be either linear or nonlinear, irrespective of which functional form provided the best results. The existing mathematical programming formulations did not include constraints ensuring the minimum number of observations required in each cluster to achieve statistical significance. In addition, a pavement sample could be associated with multiple performance models, so additional modeling was required to combine the results from multiple models. To address all these limitations, this research proposes a generalized CR that simultaneously 1) finds the optimal number of pavement clusters, 2) assigns pavement samples to clusters, 3) estimates the coefficients of cluster-specific explanatory variables, and 4) determines the best functional form between linear and nonlinear models. Linear and nonlinear functional forms were investigated to select the best model specification. A mixed-integer nonlinear mathematical program was formulated with the Bayesian Information Criterion (BIC) as the objective function. The advantage of using BIC is that it penalizes the inclusion of additional parameters (i.e., additional clusters and/or explanatory variables). Hence, the optimal CR models provide a balance between goodness of fit and model complexity. In addition, the search for the best model specification using BIC has the property of consistency: asymptotically, it selects the true model, when that model is among the candidates, with probability approaching 1. Comprehensive solution algorithms (Simulated Annealing coupled with Ordinary Least Squares for linear models, and All Subsets Regression for nonlinear models) were implemented to solve the proposed mathematical program. The algorithms selected the best model specification for each cluster after exploring all possible combinations of potentially significant explanatory variables. Potential multicollinearity issues were investigated and addressed as required. The variables identified as significant were average daily traffic, pavement age, rut depth along the pavement, annual average precipitation and minimum temperature, road functional class, prioritization category, and the number of lanes. All of these variables are identified in the literature as among the most critical factors for pavement deterioration. In addition, the predictive capability of the estimated models was investigated. The results showed that the models were robust, showed no overfitting, and provided small prediction errors. The models developed using the proposed approach provided superior explanatory power compared with those developed using the existing state-of-the-art clusterwise regression approach. In particular, for the data set used in this research, nonlinear models provided better explanatory power than linear models. As expected, the results illustrated that different clusters might require different explanatory variables and associated coefficients. Similarly, determining the optimal number of clusters while estimating the corresponding PPMs contributed significantly to reducing the estimation error.

    Effects on compliance of a HAWK signal in Las Vegas

    In 2010, 806 crashes involving pedestrians occurred in Nevada, resulting in 36 fatalities and 796 injuries. Although numerous pedestrian safety countermeasures exist in Las Vegas, NV, it was ranked as the 6th most dangerous large metropolitan area in the U.S. Therefore, additional and more effective safety countermeasures were required to reduce pedestrian crashes in Las Vegas. The High-intensity Activated crossWalK (HAWK) signal has been identified as a potential mechanism to reduce crashes. This study evaluates the effectiveness of such a signal installed on E. Sahara Avenue, Las Vegas. Data were collected from videos captured by two cameras facing eastbound and westbound traffic over two weeks: one week before and one week after the signal began operating. Statistical analyses (descriptive analysis and t-tests) were performed on different performance measures, such as pedestrian waiting time at the curb. On average, jaywalking occurrences dropped significantly from 32.6% to 8.2%, and the total crossing time decreased by 5.3 seconds. In addition, motorist compliance (yielding to pedestrians attempting to cross the street) improved, with 6.9% fewer non-yielding vehicles.
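
    The before/after comparison of a measure such as curb waiting time can be sketched as a two-sample t-test. The abstract does not state which t-test variant was used; this sketch assumes Welch's unequal-variance test, and the waiting times in the usage lines are hypothetical, not the study's data. The p-value step is left out because the Python standard library has no Student t distribution; |t| would be compared with a t-table critical value at the computed degrees of freedom.

```python
import math
import statistics


def welch_t(before, after):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(before), len(after)
    v1 = statistics.variance(before) / n1  # squared standard error of the 'before' mean
    v2 = statistics.variance(after) / n2   # squared standard error of the 'after' mean
    t = (statistics.fmean(before) - statistics.fmean(after)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical curb waiting times (seconds), before and after signal activation.
    wait_before = [12.0, 15.5, 9.8, 20.1, 14.3, 11.7, 17.2]
    wait_after = [8.2, 6.9, 10.4, 7.5, 9.1, 5.8, 8.8]
    t, df = welch_t(wait_before, wait_after)
    print(f"t = {t:.2f}, df = {df:.1f}")
```

    A positive t indicates that the mean waiting time was higher before the signal than after it.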

    Estimation of optimal pavement performance models for highways

    A mathematical program is proposed to determine an optimum number of pavement clusters, the memberships of the pavement samples in clusters, and the associated significant explanatory variables. Simulated annealing and all subsets regression were used to solve the mathematical program. Potential multicollinearity issues were examined and addressed. All possible combinations of the explanatory variables were explored to select the best model specification. Six-cluster models were determined to be the optimum solution for the dataset used in this research. The resultant models were applied to the test data set to examine their prediction accuracy. The normalized root-mean-square error was calculated for each of the resultant models. The associated models were robust, with small prediction errors.
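
    The normalized root-mean-square error used to check prediction accuracy can be computed as below. The abstract does not say how the RMSE was normalized; this sketch assumes division by the range of the observed values by default and also offers division by the observed mean, another common convention. The values in the usage lines are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def nrmse(observed, predicted, norm="range"):
    """RMSE divided by the observed range (default) or by the observed mean."""
    if norm == "range":
        scale = max(observed) - min(observed)
    elif norm == "mean":
        scale = sum(observed) / len(observed)
    else:
        raise ValueError("norm must be 'range' or 'mean'")
    return rmse(observed, predicted) / scale


if __name__ == "__main__":
    # Hypothetical held-out serviceability observations and model predictions.
    psi_obs = [3.1, 2.8, 3.6, 4.0, 2.5]
    psi_pred = [3.0, 2.9, 3.4, 4.1, 2.7]
    print(f"RMSE = {rmse(psi_obs, psi_pred):.3f}, NRMSE = {nrmse(psi_obs, psi_pred):.3f}")
```

    Normalizing makes errors comparable across clusters whose response values span different ranges.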

    Business intelligence for transportation and infrastructure systems

    This study illustrates the advantage of using a business intelligence (BI) approach for the analysis and processing of transportation and infrastructure data. As a case study, a data warehouse, interactive dashboards including maps, and advanced analytics were created for data from the Pavement Management System (PMS) of the Nevada Department of Transportation (NDOT). Combining all of these capabilities in a single platform makes it possible to maximize the value of the available data.

    Comprehensive clusterwise linear regression for pavement management systems

    A comprehensive mathematical program was formulated to determine simultaneously (1) an optimum number of pavement clusters, (2) the cluster memberships of pavement samples, (3) cluster-specific significant explanatory variables, and (4) estimated regression coefficients for pavement performance models (PPMs). Simulated annealing coupled with all-subsets regression was proposed to solve the mathematical program. The proposed algorithm was capable of identifying and addressing potential multicollinearity issues. All possible combinations of the explanatory variables were examined to select the best model, one that provided a balance among (1) the number of PPMs; (2) the number of explanatory variables; (3) the resources required to develop, maintain, and use these models; and (4) the explanatory power. For the data set used in this research, six-cluster models were determined as part of the optimum solution. The predictive capabilities of the resultant models were investigated, and the results showed that the models provided small prediction errors without any overfitting issues.
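
    The all-subsets step (fit every combination of candidate variables and keep the one with the best criterion) can be sketched for a single cluster as below. Assumptions: ordinary least squares solved from the normal equations with a small hand-written solver, and BIC = n ln(SSE/n) + (q + 1) ln(n) for a subset of q variables plus an intercept. The function names and the data are illustrative. The sketch only skips perfectly collinear subsets; a fuller multicollinearity screen (for instance a variance inflation factor check, which is an assumption here and not stated in the abstract) is omitted.

```python
import itertools
import math


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular normal equations (perfect collinearity)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_sse(rows, y):
    """OLS via the normal equations; each row starts with 1.0 for the intercept."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    sse = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    return beta, sse


def all_subsets(data, y, names):
    """Fit every non-empty subset of candidate columns; return (BIC, names, beta) of the best."""
    n = len(y)
    best = None
    for size in range(1, len(names) + 1):
        for subset in itertools.combinations(range(len(names)), size):
            rows = [[1.0] + [obs[j] for j in subset] for obs in data]
            try:
                beta, sse = ols_sse(rows, y)
            except ValueError:
                continue  # skip perfectly collinear subsets
            bic = n * math.log(max(sse, 1e-12) / n) + (size + 1) * math.log(n)
            if best is None or bic < best[0]:
                best = (bic, [names[j] for j in subset], beta)
    return best


if __name__ == "__main__":
    # Hypothetical records: [age (years), traffic (vehicles/day), rainfall index].
    data = [[(i * 7) % 20, 1000 + (i * 37) % 40 * 100, (i * 13) % 17 / 17] for i in range(30)]
    y = [1 + 2 * a + 0.001 * t + 0.05 * ((i * 11) % 5 - 2) for i, (a, t, _) in enumerate(data)]
    bic_value, chosen, beta = all_subsets(data, y, ["age", "traffic", "rainfall"])
    print("chosen variables:", chosen, "BIC:", round(bic_value, 2))
```

    Enumerating all subsets is exact but grows as 2^q in the number of candidates, which is manageable for the handful of variables listed in these abstracts.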

    Limitations of existing pavement deterioration models and a potential solution

    The current state of the art for addressing pavement deterioration is the development of Pavement Deterioration Models using a clusterwise approach that requires a priori knowledge of the optimal number of clusters as well as of the significant explanatory variables. In addition, the objective function used to solve the clusterwise problem is the minimization of the sum of squared errors, which never increases, and typically decreases, as cluster(s) and/or explanatory variable(s) are added. To address these limitations, a mathematical programming framework based on the Bayesian Information Criterion is proposed, which does not require a priori information about the optimal number of clusters. An extensive optimization approach was used to solve the proposed mathematical program, and issues associated with overfitting were investigated. Results using data from the entire State of Nevada illustrate the advantage of the proposed framework.
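
    Why the SSE objective degenerates, and how BIC counters it, can be written out as a short derivation. This is a sketch under stated assumptions: Gaussian errors, n observations, K clusters C_1, ..., C_K, and q_k explanatory variables plus an intercept in cluster k. The exact parameter count used in the paper (for example, whether error variances or memberships are counted) is not given in the abstract. SSE*(K) denotes the minimum SSE over all partitions into K clusters.

```latex
\begin{aligned}
\mathrm{SSE}(K) &= \sum_{k=1}^{K} \sum_{i \in C_k} \bigl(y_i - \mathbf{x}_i^{\top}\hat{\boldsymbol{\beta}}_k\bigr)^2,
  \qquad \mathrm{SSE}^{*}(K+1) \le \mathrm{SSE}^{*}(K) \quad (K + 1 \le n), \\
\mathrm{BIC}(K) &= n \ln\!\left(\frac{\mathrm{SSE}(K)}{n}\right) + \ln(n) \sum_{k=1}^{K} (q_k + 1).
\end{aligned}
```

    The inequality holds because splitting any cluster in two and refitting each part by least squares cannot do worse on either part than the parent cluster's coefficients, so the minimum SSE keeps falling until clusters are small enough to be fitted exactly. The BIC penalty grows by ln(n) per coefficient, so an additional cluster is favored only when the drop in n ln(SSE/n) exceeds the cost of its extra coefficients.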

    Generalised clusterwise regression for simultaneous estimation of optimal pavement clusters and performance models

    This paper focuses on the clusterwise regression (CR) approach for modelling pavement performance. CR simultaneously clusters the data and estimates the associated models. Previous studies using the CR approach have a few limitations: (1) the explanatory power of the variables used in the analyses was not tested; (2) the approach could not find the optimal number of clusters; (3) the objective function was to minimise the sum of squared errors, which is not well suited to finding the optimal number of clusters; and (4) the model functional form was restricted to be either linear or nonlinear. To address these limitations, this paper proposes a generalised mathematical programme and solution algorithm within the CR framework. The Bayesian Information Criterion was used as the objective function. The proposed approach explored all possible combinations of potentially significant explanatory variables to select the best model specification. Potential multicollinearity issues in the models were addressed as required. Both linear and nonlinear functional forms were estimated using a large dataset from Nevada. The predictive accuracy of the resultant models was evaluated using the root-mean-square error (RMSE), normalised RMSE, and mean absolute error. The results showed that the nonlinear models were more accurate than the linear models in estimating the present serviceability index.

    Evaluation of the effectiveness of a HAWK signal on compliance in Las Vegas Nevada

    Nevada continues to experience a large number of crashes involving pedestrians despite the numerous safety mechanisms currently used at roadway crossings. Hence, additional and more effective mechanisms are required to reduce crashes in Las Vegas in particular and in Nevada in general. A potential mechanism to reduce conflicts between pedestrians and vehicles is the High-intensity Activated crossWalK (HAWK) signal. This study evaluates the effects of such a signal at a particular site in Las Vegas. Video data were collected using two cameras facing the eastbound and westbound traffic. One week of video data was collected before, and one week after, the deployment of the signal to capture the behavior of both pedestrians and drivers. T-test analyses of pedestrian waiting time at the curb, curb-to-curb crossing time, total crossing time, jaywalking events, and near-crash events show that the HAWK system provides significant benefits.

    A clusterwise regression approach for the estimation of crash frequencies

    In the current literature, data are aggregated to estimate functions that explain or predict crash patterns using clustering analysis, regression analysis, or stage-wise models. Typically, analysis sites are grouped into site subtypes based on predefined characteristics. The assumption is that sites within each subtype experience similar crash patterns as a function of prespecified explanatory characteristics. To develop functions to estimate crashes, all data points are clustered only as a function of the associated site characteristics. As a consequence, the estimated parameters may be based on data with different crash patterns, representing various trends that could be better captured by multiple functions. To address this limitation, this study proposes a mathematical program that uses clusterwise regression to assign sites to clusters while simultaneously seeking the parameter values of the corresponding estimation functions, so as to maximize the probability of observing the available data. Simulated annealing, coupled with maximum likelihood estimation, was used to solve the mathematical program. Results were analyzed for two site subtypes with fatal and all injury crashes: (1) urban multilane divided roadway segments and (2) urban four-leg signalized intersections. By using multiple estimation functions within the same site subtype, clusterwise regression improved the predicted number of crashes.
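
    The likelihood objective behind the crash-frequency variant can be sketched for a single covariate. The abstract does not name the count distribution; this sketch assumes Poisson-distributed crash counts with a log link (crash models often use a negative binomial instead). Each cluster's coefficients are fitted by Newton-Raphson, and the summed log-likelihood of a membership vector is the quantity a simulated annealing search would maximize when moving sites between clusters. All names and data are illustrative.

```python
import math


def fit_poisson(x, y, iters=50, tol=1e-10):
    """Maximum likelihood for y ~ Poisson(exp(b0 + b1*x)) by Newton-Raphson."""
    b0, b1 = math.log(max(sum(y) / len(y), 1e-8)), 0.0  # start at the intercept-only MLE
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        g0 = sum(yi - mi for yi, mi in zip(y, mu))  # score for b0
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))  # score for b1
        h00 = sum(mu)  # Fisher information entries (X' W X with W = mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def poisson_loglik(x, y, b0, b1):
    """Sum of log P(Y = y_i) when Y_i is Poisson with mean exp(b0 + b1*x_i)."""
    total = 0.0
    for xi, yi in zip(x, y):
        eta = b0 + b1 * xi
        total += yi * eta - math.exp(eta) - math.lgamma(yi + 1)
    return total


def clusterwise_loglik(x, y, labels, k):
    """Objective to maximize: summed log-likelihood of each cluster's own fitted model."""
    total = 0.0
    for c in range(k):
        xs = [xi for xi, lab in zip(x, labels) if lab == c]
        ys = [yi for yi, lab in zip(y, labels) if lab == c]
        total += poisson_loglik(xs, ys, *fit_poisson(xs, ys))
    return total


if __name__ == "__main__":
    # Hypothetical sites: a scaled covariate and observed crash counts.
    x = [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5]
    y = [1, 2, 2, 4, 6, 9, 8, 6, 5, 3, 2, 2]
    pooled = clusterwise_loglik(x, y, [0] * 12, 1)
    split = clusterwise_loglik(x, y, [0] * 6 + [1] * 6, 2)
    print(f"log-likelihood: one function {pooled:.2f}, two functions {split:.2f}")
```

    As with SSE, the maximized likelihood never falls when clusters are added, so in practice a penalized criterion or a fixed number of functions per site subtype is still needed to choose how many clusters to use.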